{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "In the model training stage, to ensure the model genuinely performs well, we need to debug and tune it. This breaks down into the following five steps:\n",
    "\n",
    "- Compute the classification accuracy to monitor training progress.\n",
    "\n",
    "    The cross-entropy loss can only serve as the optimization objective; it cannot directly measure how well training is going. Accuracy measures the training result directly, but because it is discrete, it is unsuitable as a loss function for optimizing a neural network.\n",
    "    \n",
    "- Inspect the training process to identify potential problems.\n",
    "\n",
    "    When the model's loss or evaluation metrics behave abnormally, we usually print the inputs and outputs of every layer to locate the problem, analyzing each layer's contents to find the cause of the error.\n",
    "    \n",
    "- Add validation or testing to evaluate the model more reliably.\n",
    "\n",
    "    Ideally, a trained model achieves high accuracy on both the training set and the validation set. If the training accuracy is clearly higher than the validation accuracy, the model is likely overfitting; if both are low, the network is under-trained. Adding a regularization term to the optimization objective can mitigate overfitting.\n",
    "    \n",
    "- Add a regularization term to avoid overfitting.\n",
    "\n",
    "- Visual analysis.\n",
    "\n",
    "    Besides printing values or plotting with the matplotlib library, users can also rely on tb-paddle, a third-party visualization library integrated with PaddlePaddle, for convenient visual analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 1. Computing the model's classification accuracy\n",
    "\n",
    "PaddlePaddle provides a convenient API for computing classification accuracy: fluid.layers.accuracy(input, label) computes it directly, taking the predicted classification result input and the corresponding label label as its inputs.\n",
    "\n",
    "In the code below, we compute the classification accuracy inside the model's forward pass, the forward function."
   ]
  },
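  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "As a sketch of what this accuracy means (stated from the standard definition, not from Paddle's internal implementation), for a batch of $N$ samples with predicted class distributions $p_i$ and labels $y_i$:\n",
    "\n",
    "$$\\mathrm{acc} = \\frac{1}{N}\\sum_{i=1}^{N} \\mathbf{1}\\left[\\arg\\max_{c}\\, p_{i,c} = y_i\\right]$$\n",
    "\n",
    "For example, if 91 samples in a batch of 100 have their highest-probability class equal to the label, the printed accuracy is $91/100 = 0.91$."
   ]
  },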
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loading mnist dataset from ./work/mnist.json.gz ......\n",
      "epoch: 0, batch: 0, loss is: [2.6924028], acc is [0.13]\n",
      "epoch: 0, batch: 200, loss is: [0.43108827], acc is [0.9]\n",
      "epoch: 0, batch: 400, loss is: [0.28437912], acc is [0.91]\n",
      "epoch: 1, batch: 0, loss is: [0.18557501], acc is [0.94]\n",
      "epoch: 1, batch: 200, loss is: [0.2572117], acc is [0.92]\n",
      "epoch: 1, batch: 400, loss is: [0.23375131], acc is [0.96]\n",
      "epoch: 2, batch: 0, loss is: [0.2376423], acc is [0.92]\n",
      "epoch: 2, batch: 200, loss is: [0.2346335], acc is [0.94]\n",
      "epoch: 2, batch: 400, loss is: [0.12313923], acc is [0.96]\n",
      "epoch: 3, batch: 0, loss is: [0.11195458], acc is [0.97]\n",
      "epoch: 3, batch: 200, loss is: [0.06257623], acc is [0.99]\n",
      "epoch: 3, batch: 400, loss is: [0.09305647], acc is [0.98]\n",
      "epoch: 4, batch: 0, loss is: [0.11276975], acc is [0.96]\n",
      "epoch: 4, batch: 200, loss is: [0.11841835], acc is [0.96]\n",
      "epoch: 4, batch: 400, loss is: [0.17532296], acc is [0.93]\n"
     ]
    }
   ],
   "source": [
    "# Load the required libraries\n",
    "import os\n",
    "import random\n",
    "import paddle\n",
    "import paddle.fluid as fluid\n",
    "from paddle.fluid.dygraph.nn import Conv2D, Pool2D, FC\n",
    "import numpy as np\n",
    "from PIL import Image\n",
    "\n",
    "import gzip\n",
    "import json\n",
    "\n",
    "# Define the dataset reader\n",
    "def load_data(mode='train'):\n",
    "\n",
    "    # Read the data file\n",
    "    datafile = './work/mnist.json.gz'\n",
    "    print('loading mnist dataset from {} ......'.format(datafile))\n",
    "    data = json.load(gzip.open(datafile))\n",
    "    # Unpack the training, validation, and test sets\n",
    "    train_set, val_set, eval_set = data\n",
    "\n",
    "    # Dataset parameters: image height IMG_ROWS and image width IMG_COLS\n",
    "    IMG_ROWS = 28\n",
    "    IMG_COLS = 28\n",
    "    # Select the training, validation, or test set according to the mode argument\n",
    "    if mode == 'train':\n",
    "        imgs = train_set[0]\n",
    "        labels = train_set[1]\n",
    "    elif mode == 'valid':\n",
    "        imgs = val_set[0]\n",
    "        labels = val_set[1]\n",
    "    elif mode == 'eval':\n",
    "        imgs = eval_set[0]\n",
    "        labels = eval_set[1]\n",
    "    # Get the total number of images\n",
    "    imgs_length = len(imgs)\n",
    "    # Check that the number of images matches the number of labels\n",
    "    assert len(imgs) == len(labels), \\\n",
    "          \"length of train_imgs({}) should be the same as train_labels({})\".format(\n",
    "                  len(imgs), len(labels))\n",
    "\n",
    "    index_list = list(range(imgs_length))\n",
    "\n",
    "    # Batch size used when reading the data\n",
    "    BATCHSIZE = 100\n",
    "\n",
    "    # Define the data generator\n",
    "    def data_generator():\n",
    "        # In training mode, shuffle the training data\n",
    "        if mode == 'train':\n",
    "            random.shuffle(index_list)\n",
    "        imgs_list = []\n",
    "        labels_list = []\n",
    "        # Read the data by index\n",
    "        for i in index_list:\n",
    "            # Read an image and its label, reshaping and casting them\n",
    "            img = np.reshape(imgs[i], [1, IMG_ROWS, IMG_COLS]).astype('float32')\n",
    "            label = np.reshape(labels[i], [1]).astype('int64')\n",
    "            imgs_list.append(img) \n",
    "            labels_list.append(label)\n",
    "            # Once the buffer reaches the batch size, yield one batch of data\n",
    "            if len(imgs_list) == BATCHSIZE:\n",
    "                yield np.array(imgs_list), np.array(labels_list)\n",
    "                # Clear the data buffers\n",
    "                imgs_list = []\n",
    "                labels_list = []\n",
    "\n",
    "        # If fewer than BATCHSIZE samples remain, yield them\n",
    "        # together as a final mini-batch of size len(imgs_list)\n",
    "        if len(imgs_list) > 0:\n",
    "            yield np.array(imgs_list), np.array(labels_list)\n",
    "\n",
    "    return data_generator\n",
    "\n",
    "\n",
    "# Define the model structure\n",
    "class MNIST(fluid.dygraph.Layer):\n",
    "    def __init__(self, name_scope):\n",
    "        super(MNIST, self).__init__(name_scope)\n",
    "        name_scope = self.full_name()\n",
    "        # Convolution layer: 20 output channels, 5x5 kernel, stride 1, padding 2, relu activation\n",
    "        self.conv1 = Conv2D(name_scope, num_filters=20, filter_size=5, stride=1, padding=2, act='relu')\n",
    "        # Pooling layer: 2x2 window, max pooling\n",
    "        self.pool1 = Pool2D(name_scope, pool_size=2, pool_stride=2, pool_type='max')\n",
    "        # Convolution layer: 20 output channels, 5x5 kernel, stride 1, padding 2, relu activation\n",
    "        self.conv2 = Conv2D(name_scope, num_filters=20, filter_size=5, stride=1, padding=2, act='relu')\n",
    "        # Pooling layer: 2x2 window, max pooling\n",
    "        self.pool2 = Pool2D(name_scope, pool_size=2, pool_stride=2, pool_type='max')\n",
    "        # Fully connected layer: 10 output units, softmax activation\n",
    "        self.fc = FC(name_scope, size=10, act='softmax')\n",
    "\n",
    "    # Define the network's forward pass\n",
    "    def forward(self, inputs, label=None):\n",
    "        x = self.conv1(inputs)\n",
    "        x = self.pool1(x)\n",
    "        x = self.conv2(x)\n",
    "        x = self.pool2(x)\n",
    "        x = self.fc(x)\n",
    "        # If a label is given, also compute and return the classification accuracy\n",
    "        if label is not None:\n",
    "            acc = fluid.layers.accuracy(input=x, label=label)\n",
    "            return x, acc\n",
    "        else:\n",
    "            return x\n",
    "\n",
    "# Call the data-loading function\n",
    "train_loader = load_data('train')\n",
    "    \n",
    "# On a GPU machine, set use_gpu to True\n",
    "use_gpu = False\n",
    "place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()\n",
    "\n",
    "with fluid.dygraph.guard(place):\n",
    "    model = MNIST(\"mnist\")\n",
    "    model.train() \n",
    "    \n",
    "    # Four optimizer choices; try each one and compare the results\n",
    "    optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.AdagradOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.01)\n",
    "    \n",
    "    EPOCH_NUM = 5\n",
    "    for epoch_id in range(EPOCH_NUM):\n",
    "        for batch_id, data in enumerate(train_loader()):\n",
    "            # Prepare the data\n",
    "            image_data, label_data = data\n",
    "            image = fluid.dygraph.to_variable(image_data)\n",
    "            label = fluid.dygraph.to_variable(label_data)\n",
    "            \n",
    "            # Forward pass: obtain both the model output and the classification accuracy\n",
    "            predict, avg_acc = model(image, label)\n",
    "            \n",
    "            # Compute the loss, averaged over the batch\n",
    "            loss = fluid.layers.cross_entropy(predict, label)\n",
    "            avg_loss = fluid.layers.mean(loss)\n",
    "            \n",
    "            # Print the current loss every 200 batches\n",
    "            if batch_id % 200 == 0:\n",
    "                print(\"epoch: {}, batch: {}, loss is: {}, acc is {}\".format(epoch_id, batch_id, avg_loss.numpy(),avg_acc.numpy()))\n",
    "            \n",
    "            # Backpropagate and update the parameters\n",
    "            avg_loss.backward()\n",
    "            optimizer.minimize(avg_loss)\n",
    "            model.clear_gradients()\n",
    "\n",
    "    # Save the model parameters\n",
    "    fluid.save_dygraph(model.state_dict(), 'mnist')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 2. Inspecting the training process to identify potential problems\n",
    "\n",
    "Unlike the high-level APIs of some deep learning frameworks, PaddlePaddle's imperative (dygraph) mode makes it easy to inspect and debug the training process. In the network's forward function, we can print the input and output shapes of every layer, as well as each layer's parameters. Examining this information not only gives us a better understanding of how training executes, but can also expose potential problems and suggest ideas for further optimization.\n",
    "\n"
   ]
  },
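  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "As a quick sanity check on the shapes printed below, the standard output-size formula for a convolution or pooling layer is $H_{out} = \\lfloor (H_{in} + 2p - k)/s \\rfloor + 1$, where $k$ is the kernel (or pooling window) size, $p$ the padding, and $s$ the stride. Working through this network on a $28\\times 28$ input:\n",
    "\n",
    "- conv1 ($k=5$, $p=2$, $s=1$): $(28 + 4 - 5)/1 + 1 = 28$, so the output is $28\\times 28$ with 20 channels.\n",
    "- pool1 ($k=2$, $p=0$, $s=2$): $(28 - 2)/2 + 1 = 14$, giving $14\\times 14$.\n",
    "- conv2: $(14 + 4 - 5)/1 + 1 = 14$, still $14\\times 14$.\n",
    "- pool2: $(14 - 2)/2 + 1 = 7$, giving $7\\times 7$, i.e., $20\\times 7\\times 7 = 980$ features feeding the fully connected layer."
   ]
  },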
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "########## print network layer's superparams ##############\n",
      "conv1-- kernel_size:[20, 1, 5, 5], padding:[2, 2], stride:[1, 1]\n",
      "conv2-- kernel_size:[20, 20, 5, 5], padding:[2, 2], stride:[1, 1]\n",
      "pool1-- pool_type:max, pool_size:[2, 2], pool_stride:[2, 2]\n",
      "pool2-- pool_type:max, pool_size:[2, 2], pool_stride:[2, 2]\n",
      "fc-- weight_size:[980, 10], bias_size:[10], activation:softmax\n",
      "\n",
      "########## print shape of features of every layer ###############\n",
      "inputs_shape: [100, 1, 28, 28]\n",
      "outputs1_shape: [100, 20, 28, 28]\n",
      "outputs2_shape: [100, 20, 14, 14]\n",
      "outputs3_shape: [100, 20, 14, 14]\n",
      "outputs4_shape: [100, 20, 7, 7]\n",
      "outputs5_shape: [100, 10]\n",
      "epoch: 0, batch: 0, loss is: [2.3864117], acc is [0.06]\n",
      "epoch: 0, batch: 200, loss is: [0.28423238], acc is [0.89]\n",
      "epoch: 0, batch: 400, loss is: [0.25175756], acc is [0.94]\n",
      "\n",
      "########## print convolution layer's kernel ###############\n",
      "conv1 params -- kernel weights: name tmp_7629, dtype: VarType.FP32 shape: [5, 5] \tlod: {}\n",
      "\tdim: 5, 5\n",
      "\tlayout: NCHW\n",
      "\tdtype: float\n",
      "\tdata: [-0.0527429 -0.19878 -0.0240543 -0.384433 0.0400837 -0.187233 -0.0848855 -0.436887 -0.249054 -0.454552 0.308293 -0.27811 0.00470069 0.473009 -0.0133494 0.384816 -0.277996 -0.256182 0.208347 0.302825 -0.0483521 0.111671 0.0344227 0.361947 0.109431]\n",
      "\n",
      "conv2 params -- kernel weights: name tmp_7631, dtype: VarType.FP32 shape: [5, 5] \tlod: {}\n",
      "\tdim: 5, 5\n",
      "\tlayout: NCHW\n",
      "\tdtype: float\n",
      "\tdata: [0.0262879 -0.0188528 0.0815239 0.0503074 0.020773 -0.0410616 -0.0138823 -0.0522185 0.000973676 0.111355 -0.0340131 0.0308802 -0.0204868 0.0131735 -0.00594701 0.0646147 -0.00726172 -0.12028 -0.0293041 0.138257 0.00227788 -0.0612485 0.0934757 -0.0661552 -0.0567356]\n",
      "\n",
      "\n",
      "The 9th channel of conv1 layer:  name tmp_7633, dtype: VarType.FP32 shape: [28, 28] \tlod: {}\n",
      "\tdim: 28, 28\n",
      "\tlayout: NCHW\n",
      "\tdtype: float\n",
      "\tdata: [0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.11756 -0.472912 -0.524717 -0.110455 0.170891 0.155462 0.0950409 0.0311853 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.00305192 -0.301301 -0.858069 -1.08936 -0.946153 -0.440379 -0.443265 -0.277586 -0.0145415 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.00777388 -0.0445894 -0.000610448 0.0369207 0.0281302 0.0274145 0.0181969 -0.171599 -0.640565 -0.966732 -1.16112 -1.28228 -1.47838 -0.961951 -0.620457 -0.118071 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0158359 -0.188922 -0.540736 -0.265408 0.13675 0.0507877 0.0566544 0.0192642 -0.244656 -0.986818 -1.38519 -1.58696 -1.75055 -1.25026 -0.938822 -0.83676 -0.357201 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 
0.0181969 0.0181969 0.0181969 0.0181969 -0.0377816 -0.405913 -0.946643 -0.781328 -0.22868 -0.318338 -0.343631 -0.027515 -0.425642 -1.11074 -1.33324 -1.89703 -1.66673 -1.46485 -1.103 -0.958084 -0.476661 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.149594 -0.672558 -1.06833 -1.14039 -0.923044 -0.806235 -0.607837 -0.138629 -0.722816 -1.20346 -1.60937 -1.55411 -1.98643 -1.35546 -1.10131 -0.784337 -0.331305 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.179543 -0.896746 -1.3534 -1.4738 -1.39283 -0.897352 -0.821637 -0.728636 -1.01366 -1.22779 -1.62143 -1.77813 -1.82903 -1.39189 -1.05181 -0.729918 -0.243456 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.251226 -0.914772 -1.41565 -1.63522 -1.30136 -1.14017 -1.24057 -1.05349 -1.33201 -1.48766 -1.72368 -1.78154 -1.45738 -1.29806 -0.87306 -0.787796 -0.0738183 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.0183983 -0.468854 -1.0788 -1.43129 -1.54647 -1.49108 -1.3252 -1.12086 -1.62322 -1.09352 -1.53387 -1.73499 -2.03787 -1.43796 -1.14988 -0.99579 -0.255096 0.0523104 0.0725252 0.0487826 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.219678 -0.786299 -1.16232 -1.51743 -1.67102 -2.10665 -1.87273 -1.61025 -1.81182 -1.30948 -1.74135 -1.84965 -2.33772 -2.09153 -1.56405 -1.25072 -0.482573 0.0210531 -0.163131 -0.100295 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.283025 -1.06228 -1.37236 -1.63572 -2.15183 -2.26769 -2.3018 -2.2099 -2.27168 -2.0452 -2.38453 -2.26506 -2.35589 -2.33514 -2.19016 -2.17966 -1.01354 -0.719112 -0.657559 -0.206499 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.337114 -1.09303 -1.44159 -1.91842 -2.10403 -2.40854 -2.32859 -2.3856 -2.71474 -2.58 -2.19456 
-2.4633 -2.49826 -2.30228 -2.39292 -2.20541 -1.61909 -0.879525 -0.691607 -0.280286 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.458916 -1.10639 -1.60525 -1.4838 -2.34837 -2.61089 -2.50356 -2.58135 -2.3938 -2.2834 -2.29151 -2.08444 -2.11706 -2.13148 -2.20169 -1.90325 -1.45948 -1.01719 -0.919075 -0.346697 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.167191 -0.749538 -1.53401 -1.60811 -2.17618 -2.10022 -2.37093 -2.73715 -2.58716 -2.56452 -2.17666 -1.65175 -1.48351 -1.37417 -1.14347 -0.900918 -0.717453 -0.610158 -0.517472 -0.178407 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.100036 -0.235517 -0.556738 -0.946388 -1.3555 -1.74807 -2.09176 -2.27371 -2.19904 -1.99594 -1.88617 -1.51152 -1.32094 -1.00768 -0.387355 -0.220935 -0.0866746 0.0176327 -0.130734 -0.0858562 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.065868 -0.24598 -0.282678 0.0352375 -0.385841 -0.907669 -1.53144 -1.6149 -1.61007 -1.99227 -1.56821 -1.19555 -0.890975 -0.532596 -0.273377 -0.236502 -0.192296 -0.210495 -0.134687 -0.0112032 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0970478 -0.108064 -0.100759 -0.440874 -0.0410803 -0.306086 -1.08229 -1.57573 -2.09683 -1.81027 -1.50375 -1.09724 -0.723585 -0.483745 -0.0491462 -0.0304231 -0.026034 -0.0343797 -0.0103376 0.0153979 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0219053 0.0856821 0.0451348 0.0884194 -0.216727 -0.84588 -1.44015 -1.69111 -1.48111 -1.79456 -1.41799 -1.0516 -0.957818 -0.344511 0.0147394 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.147172 -0.752489 -1.28073 -1.56315 -1.66953 -1.52212 -1.3382 -0.951877 -0.87979 -0.109301 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 
0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.0584733 -0.642842 -1.38066 -1.72477 -1.46364 -1.38526 -1.12154 -0.865802 -0.313515 -0.0507344 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.0659831 -0.182903 -0.631622 -0.993449 -1.14353 -1.46018 -1.124 -0.60987 -0.289162 0.00617774 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 -0.0511393 -0.170603 -0.269663 -0.0526375 -0.193566 -0.590656 -0.286179 -0.52746 -0.0372443 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0515126 -0.103396 -0.188025 -0.345909 0.0161284 0.142517 0.0415473 -0.216814 0.0134222 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0268498 0.0954517 0.0459056 0.0897445 -0.188241 -0.214549 -0.278775 -0.0153909 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969 0.0181969]\n",
      "\n",
      "The 8th channel of conv2 layer:  name tmp_7635, dtype: VarType.FP32 shape: [14, 14] \tlod: {}\n",
      "\tdim: 14, 14\n",
      "\tlayout: NCHW\n",
      "\tdtype: float\n",
      "\tdata: [-0.0019942 0.0188036 0.0373049 0.0151387 0.00785978 0.24082 0.397191 0.078314 -0.590374 -0.51675 -0.622034 -0.575276 -0.198225 -0.00227077 -0.0162918 0.191446 0.351654 0.0844442 -0.132191 0.0467417 -0.351591 -0.881142 -1.15884 -1.06847 -1.1999 -1.29996 -0.424855 -0.080155 -0.0397741 0.155728 0.047623 -0.42589 -0.732288 -0.764239 -1.40332 -1.59918 -1.39671 -1.70829 -2.15471 -1.07202 -0.122769 0.0341075 -0.018346 0.0395736 -0.302297 -0.512512 -1.1487 -1.31622 -2.11096 -1.34954 0.00876836 -0.335566 -1.20016 -0.123448 0.670965 0.187807 0.138644 0.178817 -0.572237 -0.304277 -0.725802 -1.61325 -2.77149 -1.04818 1.61363 2.28678 1.15907 1.10344 0.660855 0.11232 0.11721 0.0609466 -0.890022 -0.974328 0.750342 -0.117433 -1.97094 -0.282768 2.18559 2.88298 1.92797 1.03616 -0.526508 -0.43351 -0.122228 -0.263404 -1.06843 -1.12632 1.65188 1.47044 -0.492995 0.851342 3.10747 2.64009 1.02116 -0.125388 -0.109848 0.0985759 0.0331164 -0.443193 -0.542076 0.252782 2.88206 3.02155 2.12192 4.20215 5.42964 3.36522 1.40146 0.570531 0.739502 0.655618 0.0922297 0.0500337 0.0520436 2.23766 4.52819 4.127 4.72267 6.62733 5.4981 5.01179 5.95312 4.01218 1.95425 0.956061 -0.0150926 -0.140758 -0.0922132 2.17377 4.09847 5.48601 6.31862 5.54978 4.47569 5.02114 5.11304 3.37309 1.14367 0.279345 -0.0276775 -0.246963 -0.011141 1.18587 2.34113 2.78772 3.98913 4.57961 3.40079 2.9081 1.51184 0.393874 -0.150948 -0.0795593 -0.0562561 0.0206475 0.602201 0.944919 0.422149 0.627431 2.71001 4.05958 2.45138 2.00511 0.575191 -0.20492 -0.228816 -0.123965 -0.0221522 0.0074671 0.0703065 0.205022 -0.378513 0.835829 4.49346 4.50161 2.69967 1.29834 0.439264 -0.115154 -0.0432795 -0.0381728 -0.0301158 -0.0369177 -0.0677214 -0.294625 -0.157447 1.09286 3.34236 2.30212 0.881401 -0.0167041 0.0787346 -0.0865486 -0.0557189 -0.042451]\n",
      "\n",
      "The output of last layer: name tmp_7636, dtype: VarType.FP32 shape: [10] \tlod: {}\n",
      "\tdim: 10\n",
      "\tlayout: NCHW\n",
      "\tdtype: float\n",
      "\tdata: [2.84775e-06 3.10967e-07 7.30589e-06 4.80264e-06 0.996892 4.97275e-06 0.000271722 1.9646e-05 0.00101776 0.00177867]\n",
      " \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Define the model structure\n",
    "class MNIST(fluid.dygraph.Layer):\n",
    "    def __init__(self, name_scope):\n",
    "        super(MNIST, self).__init__(name_scope)\n",
    "        name_scope = self.full_name()\n",
    "        self.conv1 = Conv2D(name_scope, num_filters=20, filter_size=5, stride=1, padding=2)\n",
    "        self.pool1 = Pool2D(name_scope, pool_size=2, pool_stride=2, pool_type='max')\n",
    "        self.conv2 = Conv2D(name_scope, num_filters=20, filter_size=5, stride=1, padding=2)\n",
    "        self.pool2 = Pool2D(name_scope, pool_size=2, pool_stride=2, pool_type='max')\n",
    "        self.fc = FC(name_scope, size=10, act='softmax')\n",
    "\n",
    "    # Print each layer's input/output sizes and contents; the check_shape and check_content\n",
    "    # flags decide whether to print each layer's parameters and output shapes\n",
    "    def forward(self, inputs, label=None, check_shape=False, check_content=False):\n",
    "        # Give each layer's output a distinct name to ease debugging\n",
    "        outputs1 = self.conv1(inputs)\n",
    "        outputs2 = self.pool1(outputs1)\n",
    "        outputs3 = self.conv2(outputs2)\n",
    "        outputs4 = self.pool2(outputs3)\n",
    "        outputs5 = self.fc(outputs4)\n",
    "\n",
    "        # Optionally print each layer's parameter shapes and output shapes to verify the network structure\n",
    "        if check_shape:\n",
    "            # Print each layer's hyperparameters: kernel size, stride, padding, and pooling window size\n",
    "            print(\"\\n########## print network layer's superparams ##############\")\n",
    "            print(\"conv1-- kernel_size:{}, padding:{}, stride:{}\".format(self.conv1.weight.shape, self.conv1._padding, self.conv1._stride))\n",
    "            print(\"conv2-- kernel_size:{}, padding:{}, stride:{}\".format(self.conv2.weight.shape, self.conv2._padding, self.conv2._stride))\n",
    "            print(\"pool1-- pool_type:{}, pool_size:{}, pool_stride:{}\".format(self.pool1._pool_type, self.pool1._pool_size, self.pool1._pool_stride))\n",
    "            print(\"pool2-- pool_type:{}, pool_size:{}, pool_stride:{}\".format(self.pool2._pool_type, self.pool2._pool_size, self.pool2._pool_stride))\n",
    "            print(\"fc-- weight_size:{}, bias_size:{}, activation:{}\".format(self.fc.weight.shape, self.fc.bias.shape, self.fc._act))\n",
    "\n",
    "            # Print each layer's output shape\n",
    "            print(\"\\n########## print shape of features of every layer ###############\")\n",
    "            print(\"inputs_shape: {}\".format(inputs.shape))\n",
    "            print(\"outputs1_shape: {}\".format(outputs1.shape))\n",
    "            print(\"outputs2_shape: {}\".format(outputs2.shape))\n",
    "            print(\"outputs3_shape: {}\".format(outputs3.shape))\n",
    "            print(\"outputs4_shape: {}\".format(outputs4.shape))\n",
    "            print(\"outputs5_shape: {}\".format(outputs5.shape))\n",
    "\n",
    "        # Optionally print parameters and outputs during training, useful for debugging\n",
    "        if check_content:\n",
    "            # Print the convolution layers' kernel weights; there are many, so only print a few\n",
    "            print(\"\\n########## print convolution layer's kernel ###############\")\n",
    "            print(\"conv1 params -- kernel weights:\", self.conv1.weight[0][0])\n",
    "            print(\"conv2 params -- kernel weights:\", self.conv2.weight[0][0])\n",
    "\n",
    "            # Pick a random channel of each convolution's output to print\n",
    "            idx1 = np.random.randint(0, outputs1.shape[1])\n",
    "            idx2 = np.random.randint(0, outputs3.shape[1])\n",
    "            # Print post-convolution/pooling features, only for the first image in the batch\n",
    "            print(\"\\nThe {}th channel of conv1 layer: \".format(idx1), outputs1[0][idx1])\n",
    "            print(\"The {}th channel of conv2 layer: \".format(idx2), outputs3[0][idx2])\n",
    "            print(\"The output of last layer:\", outputs5[0], '\\n')\n",
    "\n",
    "        # If label is not None, compute and return the classification accuracy as well\n",
    "        if label is not None:\n",
    "            acc = fluid.layers.accuracy(input=outputs5, label=label)\n",
    "            return outputs5, acc\n",
    "        else:\n",
    "            return outputs5\n",
    "\n",
    "    \n",
    "# On a GPU machine, set use_gpu to True\n",
    "use_gpu = False\n",
    "place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()\n",
    "\n",
    "with fluid.dygraph.guard(place):\n",
    "    model = MNIST(\"mnist\")\n",
    "    model.train() \n",
    "    \n",
    "    # Four optimizer choices; try each one and compare the results\n",
    "    optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.AdagradOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.01)\n",
    "    \n",
    "    EPOCH_NUM = 1\n",
    "    for epoch_id in range(EPOCH_NUM):\n",
    "        for batch_id, data in enumerate(train_loader()):\n",
    "            # Prepare the data (more concise now)\n",
    "            image_data, label_data = data\n",
    "            image = fluid.dygraph.to_variable(image_data)\n",
    "            label = fluid.dygraph.to_variable(label_data)\n",
    "            \n",
    "            # Forward pass: obtain both the model output and the classification accuracy\n",
    "            if batch_id == 0 and epoch_id == 0:\n",
    "                # Print the model parameters and each layer's output shapes\n",
    "                predict, avg_acc = model(image, label, check_shape=True, check_content=False)\n",
    "            elif batch_id == 401:\n",
    "                # Print the model parameters and each layer's output values\n",
    "                predict, avg_acc = model(image, label, check_shape=False, check_content=True)\n",
    "            else:\n",
    "                predict, avg_acc = model(image, label)\n",
    "            \n",
    "            # Compute the loss, averaged over the batch\n",
    "            loss = fluid.layers.cross_entropy(predict, label)\n",
    "            avg_loss = fluid.layers.mean(loss)\n",
    "            \n",
    "            # Print the current loss every 200 batches\n",
    "            if batch_id % 200 == 0:\n",
    "                print(\"epoch: {}, batch: {}, loss is: {}, acc is {}\".format(epoch_id, batch_id, avg_loss.numpy(),avg_acc.numpy()))\n",
    "            \n",
    "            # Backpropagate and update the parameters\n",
    "            avg_loss.backward()\n",
    "            optimizer.minimize(avg_loss)\n",
    "            model.clear_gradients()\n",
    "\n",
    "    # Save the model parameters\n",
    "    fluid.save_dygraph(model.state_dict(), 'mnist')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 3. Adding validation or testing to evaluate the model more reliably\n",
    "\n",
    "During training, we observe the model's loss on the training set steadily decreasing. But does that mean the model will remain effective in its future application scenario? To verify the model's effectiveness, the sample set is usually split into three parts: the training set, the validation set, and the test set.\n",
    "\n",
    "- **Training set**: used to train the model's parameters, i.e., the main work performed during training.\n",
    "- **Validation set**: used to choose the model's hyperparameters, e.g., adjusting the network structure or selecting the regularization weight.\n",
    "- **Test set**: used to simulate the model's real-world performance after deployment. Because the test set plays no part in any model optimization or parameter training, its samples are entirely unseen by the model. As long as the validation data is not used to tune the network structure or hyperparameters, validation and test results behave similarly; both reflect the model's true performance.\n",
    "\n",
    "The program below loads the model parameters saved in the previous step, reads the evaluation dataset, and measures the model's performance on it. The results show that the model still achieves ?% accuracy on data it has never seen, demonstrating that it has predictive power."
   ]
  },
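  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "One detail worth noting about the evaluation loop below: it reports the mean of per-batch accuracies, $\\mathrm{acc} = \\frac{1}{B}\\sum_{b=1}^{B} \\mathrm{acc}_b$, which equals the true dataset accuracy only when every batch has the same size. Here that holds, since MNIST's 10,000 evaluation images divide evenly into 100 batches of 100."
   ]
  },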
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "start evaluation .......\n",
      "loading mnist dataset from ./work/mnist.json.gz ......\n",
      "loss=0.09713039316236972, acc=0.9705000025033951\n"
     ]
    }
   ],
   "source": [
    "with fluid.dygraph.guard():\n",
    "    print('start evaluation .......')\n",
    "    # Load the model parameters\n",
    "    model = MNIST(\"mnist\")\n",
    "    model_state_dict, _ = fluid.load_dygraph('mnist')\n",
    "    model.load_dict(model_state_dict)\n",
    "\n",
    "    model.eval()\n",
    "    eval_loader = load_data('eval')\n",
    "\n",
    "    acc_set = []\n",
    "    avg_loss_set = []\n",
    "    for batch_id, data in enumerate(eval_loader()):\n",
    "        x_data, y_data = data\n",
    "        img = fluid.dygraph.to_variable(x_data)\n",
    "        label = fluid.dygraph.to_variable(y_data)\n",
    "        prediction, acc = model(img, label)\n",
    "        loss = fluid.layers.cross_entropy(input=prediction, label=label)\n",
    "        avg_loss = fluid.layers.mean(loss)\n",
    "        acc_set.append(float(acc.numpy()))\n",
    "        avg_loss_set.append(float(avg_loss.numpy()))\n",
    "    \n",
    "    # Average the loss and accuracy over all batches\n",
    "    acc_val_mean = np.array(acc_set).mean()\n",
    "    avg_loss_val_mean = np.array(avg_loss_set).mean()\n",
    "\n",
    "    print('loss={}, acc={}'.format(avg_loss_val_mean, acc_val_mean))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 4. Adding a regularization term to avoid overfitting\n",
    "\n",
    "For complex tasks that require a powerful model but offer only a limited number of samples, the model easily overfits: the loss is small on the training set but noticeably larger on the validation or test set. For the theory behind overfitting, refer to the course 《机器学习的思考故事》.\n",
    "\n",
    "To avoid overfitting when collecting more samples is not an option, the only recourse is to reduce the model's complexity. This can be achieved by limiting the number of parameters or their possible values (keeping parameter values small). Concretely, a penalty on the scale of the parameters is added to the model's optimization objective (the loss): the more parameters there are, or the larger their values, the larger the penalty. By tuning the penalty's weight coefficient, the model can strike a balance between minimizing the training loss and preserving its generalization ability, i.e., remaining effective on samples it has never seen. Note that the regularization term does increase the model's loss on the training set.\n",
    "\n",
    "PaddlePaddle supports adding a uniform regularization term to all parameters, as well as adding regularization terms to specific parameters. The former is shown in the code below and only requires setting the regularization argument on the optimizer. The regularization_coeff parameter adjusts the weight of the regularization term: the larger it is, the more heavily model complexity is penalized."
   ]
  },
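  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "As a sketch in the standard form of L2 regularization (the exact scaling Paddle applies may differ), adding an L2 decay regularizer changes the optimization objective from the plain cross-entropy loss $L_{CE}$ to\n",
    "\n",
    "$$L = L_{CE} + \\lambda \\sum_{j} \\theta_j^2,$$\n",
    "\n",
    "where the sum runs over the trainable parameters $\\theta_j$ and $\\lambda$ is the weight set by regularization_coeff. A larger $\\lambda$ pushes the parameters toward smaller values, trading some training loss for better generalization."
   ]
  },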
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch: 0, batch: 0, loss is: [2.5719237], acc is [0.2]\n",
      "epoch: 0, batch: 100, loss is: [0.2886358], acc is [0.93]\n",
      "epoch: 0, batch: 200, loss is: [0.3907683], acc is [0.91]\n",
      "epoch: 0, batch: 300, loss is: [0.34717605], acc is [0.91]\n",
      "epoch: 0, batch: 400, loss is: [0.29703715], acc is [0.92]\n",
      "epoch: 1, batch: 0, loss is: [0.35397568], acc is [0.88]\n",
      "epoch: 1, batch: 100, loss is: [0.3475823], acc is [0.89]\n",
      "epoch: 1, batch: 200, loss is: [0.31529555], acc is [0.94]\n",
      "epoch: 1, batch: 300, loss is: [0.4401128], acc is [0.86]\n",
      "epoch: 1, batch: 400, loss is: [0.43345112], acc is [0.88]\n",
      "epoch: 2, batch: 0, loss is: [0.30035377], acc is [0.93]\n",
      "epoch: 2, batch: 100, loss is: [0.19205013], acc is [0.96]\n",
      "epoch: 2, batch: 200, loss is: [0.35591784], acc is [0.87]\n",
      "epoch: 2, batch: 300, loss is: [0.2924868], acc is [0.9]\n",
      "epoch: 2, batch: 400, loss is: [0.26790595], acc is [0.95]\n",
      "epoch: 3, batch: 0, loss is: [0.23955241], acc is [0.94]\n",
      "epoch: 3, batch: 100, loss is: [0.24024098], acc is [0.92]\n",
      "epoch: 3, batch: 200, loss is: [0.32930472], acc is [0.9]\n",
      "epoch: 3, batch: 300, loss is: [0.31239623], acc is [0.92]\n",
      "epoch: 3, batch: 400, loss is: [0.39592034], acc is [0.88]\n",
      "epoch: 4, batch: 0, loss is: [0.28901288], acc is [0.91]\n",
      "epoch: 4, batch: 100, loss is: [0.3212445], acc is [0.91]\n",
      "epoch: 4, batch: 200, loss is: [0.34330502], acc is [0.93]\n",
      "epoch: 4, batch: 300, loss is: [0.24487075], acc is [0.95]\n",
      "epoch: 4, batch: 400, loss is: [0.34919304], acc is [0.94]\n",
      "epoch: 5, batch: 0, loss is: [0.328629], acc is [0.94]\n",
      "epoch: 5, batch: 100, loss is: [0.3838482], acc is [0.91]\n",
      "epoch: 5, batch: 200, loss is: [0.33866328], acc is [0.91]\n",
      "epoch: 5, batch: 300, loss is: [0.24536705], acc is [0.95]\n",
      "epoch: 5, batch: 400, loss is: [0.3349511], acc is [0.9]\n",
      "epoch: 6, batch: 0, loss is: [0.2786575], acc is [0.95]\n",
      "epoch: 6, batch: 100, loss is: [0.32455498], acc is [0.89]\n",
      "epoch: 6, batch: 200, loss is: [0.27402192], acc is [0.92]\n",
      "epoch: 6, batch: 300, loss is: [0.34056935], acc is [0.92]\n",
      "epoch: 6, batch: 400, loss is: [0.42240426], acc is [0.89]\n",
      "epoch: 7, batch: 0, loss is: [0.4185967], acc is [0.9]\n",
      "epoch: 7, batch: 100, loss is: [0.2715359], acc is [0.93]\n",
      "epoch: 7, batch: 200, loss is: [0.40134203], acc is [0.85]\n",
      "epoch: 7, batch: 300, loss is: [0.26778865], acc is [0.93]\n",
      "epoch: 7, batch: 400, loss is: [0.35352013], acc is [0.91]\n",
      "epoch: 8, batch: 0, loss is: [0.3710443], acc is [0.9]\n",
      "epoch: 8, batch: 100, loss is: [0.2654761], acc is [0.94]\n",
      "epoch: 8, batch: 200, loss is: [0.30841744], acc is [0.92]\n",
      "epoch: 8, batch: 300, loss is: [0.31274542], acc is [0.93]\n",
      "epoch: 8, batch: 400, loss is: [0.44749957], acc is [0.9]\n",
      "epoch: 9, batch: 0, loss is: [0.35223156], acc is [0.91]\n",
      "epoch: 9, batch: 100, loss is: [0.34217975], acc is [0.92]\n",
      "epoch: 9, batch: 200, loss is: [0.29989934], acc is [0.92]\n",
      "epoch: 9, batch: 300, loss is: [0.32098228], acc is [0.92]\n",
      "epoch: 9, batch: 400, loss is: [0.35757118], acc is [0.91]\n"
     ]
    }
   ],
   "source": [
    "with fluid.dygraph.guard():\n",
    "    model = MNIST(\"mnist\")\n",
    "    model.train() \n",
    "    \n",
    "    # Four optimizer configurations; try each in turn to compare their effect\n",
    "    #optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.AdagradOptimizer(learning_rate=0.01)\n",
    "    #optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.01)\n",
    "    \n",
    "    # Any of the optimizers can take a regularization term to avoid overfitting; regularization_coeff adjusts its weight\n",
    "    #optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.01, regularization=fluid.regularizer.L2Decay(regularization_coeff=0.1))\n",
    "    optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.01, regularization=fluid.regularizer.L2Decay(regularization_coeff=0.1))\n",
    "    \n",
    "    EPOCH_NUM = 10\n",
    "    for epoch_id in range(EPOCH_NUM):\n",
    "        for batch_id, data in enumerate(train_loader()):\n",
    "            # Prepare the data (now more concise)\n",
    "            image_data, label_data = data\n",
    "            image = fluid.dygraph.to_variable(image_data)\n",
    "            label = fluid.dygraph.to_variable(label_data)\n",
    "            \n",
    "            # Forward pass; returns both the model prediction and the classification accuracy\n",
    "            predict, avg_acc = model(image, label)\n",
    "\n",
    "            # Compute the loss, averaged over the batch\n",
    "            loss = fluid.layers.cross_entropy(predict, label)\n",
    "            avg_loss = fluid.layers.mean(loss)\n",
    "            \n",
    "            # Print the current loss every 100 batches\n",
    "            if batch_id % 100 == 0:\n",
    "                print(\"epoch: {}, batch: {}, loss is: {}, acc is {}\".format(epoch_id, batch_id, avg_loss.numpy(),avg_acc.numpy()))\n",
    "            \n",
    "            # Backward pass and parameter update\n",
    "            avg_loss.backward()\n",
    "            optimizer.minimize(avg_loss)\n",
    "            model.clear_gradients()\n",
    "\n",
    "    # Save the model parameters\n",
    "    fluid.save_dygraph(model.state_dict(), 'mnist')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 5. Visualization\n",
    "\n",
    "When training a model, we often need to watch its evaluation metrics and analyze the optimization process to confirm that training is effective. As the earlier examples showed, plotting such metrics with the lightweight matplotlib library is straightforward.\n",
    "\n",
    "### Plotting the training loss curve with matplotlib\n",
    "\n",
    "Use the index of each training batch as the X coordinate and that batch's training loss as the Y coordinate. Two list variables hold the batch indices (iters=[]) and the training losses (losses=[]); the axis labels are set with plt.xlabel(\"iter\", fontsize=14) and plt.ylabel(\"loss\", fontsize=14). Finally, a call to plt.plot(iters, losses) draws the curve."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch: 0, batch: 0, loss is: [2.8867545], acc is [0.04]\n",
      "epoch: 0, batch: 100, loss is: [0.81941694], acc is [0.81]\n",
      "epoch: 0, batch: 200, loss is: [0.4344087], acc is [0.89]\n",
      "epoch: 0, batch: 300, loss is: [0.24528305], acc is [0.93]\n",
      "epoch: 0, batch: 400, loss is: [0.22761887], acc is [0.94]\n",
      "epoch: 1, batch: 0, loss is: [0.21595153], acc is [0.93]\n",
      "epoch: 1, batch: 100, loss is: [0.21793167], acc is [0.96]\n",
      "epoch: 1, batch: 200, loss is: [0.30391282], acc is [0.9]\n",
      "epoch: 1, batch: 300, loss is: [0.18076742], acc is [0.96]\n",
      "epoch: 1, batch: 400, loss is: [0.12867425], acc is [0.97]\n",
      "epoch: 2, batch: 0, loss is: [0.12843314], acc is [0.97]\n",
      "epoch: 2, batch: 100, loss is: [0.10542139], acc is [0.97]\n",
      "epoch: 2, batch: 200, loss is: [0.14310229], acc is [0.96]\n",
      "epoch: 2, batch: 300, loss is: [0.17816946], acc is [0.94]\n",
      "epoch: 2, batch: 400, loss is: [0.05590251], acc is [0.98]\n",
      "epoch: 3, batch: 0, loss is: [0.15483671], acc is [0.97]\n",
      "epoch: 3, batch: 100, loss is: [0.1392599], acc is [0.96]\n",
      "epoch: 3, batch: 200, loss is: [0.19217391], acc is [0.93]\n",
      "epoch: 3, batch: 300, loss is: [0.2230981], acc is [0.96]\n",
      "epoch: 3, batch: 400, loss is: [0.06703009], acc is [0.98]\n",
      "epoch: 4, batch: 0, loss is: [0.11380026], acc is [0.98]\n",
      "epoch: 4, batch: 100, loss is: [0.07314029], acc is [0.98]\n",
      "epoch: 4, batch: 200, loss is: [0.08654656], acc is [0.96]\n",
      "epoch: 4, batch: 300, loss is: [0.12873845], acc is [0.97]\n",
      "epoch: 4, batch: 400, loss is: [0.10113725], acc is [0.98]\n",
      "epoch: 5, batch: 0, loss is: [0.11890317], acc is [0.97]\n",
      "epoch: 5, batch: 100, loss is: [0.0864516], acc is [0.99]\n",
      "epoch: 5, batch: 200, loss is: [0.03945278], acc is [0.99]\n",
      "epoch: 5, batch: 300, loss is: [0.09791821], acc is [0.98]\n",
      "epoch: 5, batch: 400, loss is: [0.07187682], acc is [0.99]\n",
      "epoch: 6, batch: 0, loss is: [0.03127572], acc is [1.]\n",
      "epoch: 6, batch: 100, loss is: [0.06646111], acc is [0.99]\n",
      "epoch: 6, batch: 200, loss is: [0.06583631], acc is [0.99]\n",
      "epoch: 6, batch: 300, loss is: [0.04829451], acc is [0.99]\n",
      "epoch: 6, batch: 400, loss is: [0.02084539], acc is [1.]\n",
      "epoch: 7, batch: 0, loss is: [0.04691195], acc is [0.99]\n",
      "epoch: 7, batch: 100, loss is: [0.16179712], acc is [0.96]\n",
      "epoch: 7, batch: 200, loss is: [0.09541915], acc is [0.96]\n",
      "epoch: 7, batch: 300, loss is: [0.04241013], acc is [0.99]\n",
      "epoch: 7, batch: 400, loss is: [0.0865655], acc is [0.98]\n",
      "epoch: 8, batch: 0, loss is: [0.11079986], acc is [0.98]\n",
      "epoch: 8, batch: 100, loss is: [0.10248048], acc is [0.96]\n",
      "epoch: 8, batch: 200, loss is: [0.05022374], acc is [0.99]\n",
      "epoch: 8, batch: 300, loss is: [0.15147845], acc is [0.95]\n",
      "epoch: 8, batch: 400, loss is: [0.11052441], acc is [0.99]\n",
      "epoch: 9, batch: 0, loss is: [0.08408798], acc is [0.96]\n",
      "epoch: 9, batch: 100, loss is: [0.0684256], acc is [0.98]\n",
      "epoch: 9, batch: 200, loss is: [0.07009204], acc is [0.98]\n",
      "epoch: 9, batch: 300, loss is: [0.09384029], acc is [0.98]\n",
      "epoch: 9, batch: 400, loss is: [0.0384986], acc is [0.98]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "with fluid.dygraph.guard(place):\n",
    "    model = MNIST(\"mnist\")\n",
    "    model.train() \n",
    "    \n",
    "    # Set up the optimizer (the other optimizers can be tried for comparison)\n",
    "    optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.01)\n",
    "    \n",
    "    EPOCH_NUM = 10\n",
    "    iter=0\n",
    "    iters=[]\n",
    "    losses=[]\n",
    "    for epoch_id in range(EPOCH_NUM):\n",
    "        for batch_id, data in enumerate(train_loader()):\n",
    "            # Prepare the data (now more concise)\n",
    "            image_data, label_data = data\n",
    "            image = fluid.dygraph.to_variable(image_data)\n",
    "            label = fluid.dygraph.to_variable(label_data)\n",
    "            \n",
    "            # Forward pass; returns both the model prediction and the classification accuracy\n",
    "            predict, avg_acc = model(image, label)\n",
    "\n",
    "            # Compute the loss, averaged over the batch\n",
    "            loss = fluid.layers.cross_entropy(predict, label)\n",
    "            avg_loss = fluid.layers.mean(loss)\n",
    "            \n",
    "            # Print the current loss every 100 batches and record a data point for plotting\n",
    "            if batch_id % 100 == 0:\n",
    "                print(\"epoch: {}, batch: {}, loss is: {}, acc is {}\".format(epoch_id, batch_id, avg_loss.numpy(),avg_acc.numpy()))\n",
    "                iters.append(iter)\n",
    "                losses.append(avg_loss.numpy())\n",
    "                iter = iter + 100\n",
    "\n",
    "            # Backward pass and parameter update\n",
    "            avg_loss.backward()\n",
    "            optimizer.minimize(avg_loss)\n",
    "            model.clear_gradients()\n",
    "\n",
    "    # Save the model parameters\n",
    "    fluid.save_dygraph(model.state_dict(), 'mnist')\n",
    "\n",
    "    # Plot the loss curve over the course of training\n",
    "    plt.figure()\n",
    "    plt.title(\"train loss\", fontsize=24)\n",
    "    plt.xlabel(\"iter\", fontsize=14)\n",
    "    plt.ylabel(\"loss\", fontsize=14)\n",
    "    plt.plot(iters, losses,color='red',label='train loss') \n",
    "    plt.grid()\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "### Visualization with tb-paddle\n",
    "\n",
    "For a more professional plotting tool, try tb-paddle. It can display the computation graph built while a Paddle program runs, the trends of various metrics over time, and the data used during training. Using tb-paddle is not complicated and breaks down into four main steps.\n",
    "\n",
    "Step 1: import the tb_paddle library and set the directory where the plotting data will be stored (used again in Step 3); in this case the path is \"log/data\".\n",
    "```\n",
    "from tb_paddle import SummaryWriter\n",
    "data_writer = SummaryWriter(logdir=\"log/data\")\n",
    "```\n",
    "Step 2: insert logging statements into the training loop. After every 100 batches, store the current loss as a new data point (a mapping from iter to the loss) in the directory configured in Step 1. The variable iter records how many batches have been trained so far and serves as the X coordinate of the plot.\n",
    "\n",
    "```\n",
    "data_writer.add_scalar(\"train/loss\", avg_loss.numpy(), iter)\n",
    "data_writer.add_scalar(\"train/accuracy\", avg_acc.numpy(), iter)\n",
    "iter = iter + 100\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Import the tb_paddle library and set where the plotting data will be saved\n",
    "from tb_paddle import SummaryWriter\n",
    "data_writer = SummaryWriter(logdir=\"log/data\")\n",
    "\n",
    "with fluid.dygraph.guard(place):\n",
    "    model = MNIST(\"mnist\")\n",
    "    model.train() \n",
    "    \n",
    "    # Set up the optimizer (the other optimizers can be tried for comparison)\n",
    "    optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.01)\n",
    "    \n",
    "    EPOCH_NUM = 10\n",
    "    iter = 0\n",
    "    for epoch_id in range(EPOCH_NUM):\n",
    "        for batch_id, data in enumerate(train_loader()):\n",
    "            # Prepare the data (now more concise)\n",
    "            image_data, label_data = data\n",
    "            image = fluid.dygraph.to_variable(image_data)\n",
    "            label = fluid.dygraph.to_variable(label_data)\n",
    "            \n",
    "            # Forward pass; returns both the model prediction and the classification accuracy\n",
    "            predict, avg_acc = model(image, label)\n",
    "\n",
    "            # Compute the loss, averaged over the batch\n",
    "            loss = fluid.layers.cross_entropy(predict, label)\n",
    "            avg_loss = fluid.layers.mean(loss)\n",
    "            \n",
    "            # Print the current loss every 100 batches and write a data point for tb-paddle\n",
    "            if batch_id % 100 == 0:\n",
    "                print(\"epoch: {}, batch: {}, loss is: {}, acc is {}\".format(epoch_id, batch_id, avg_loss.numpy(), avg_acc.numpy()))\n",
    "                data_writer.add_scalar(\"train/loss\", avg_loss.numpy(), iter)\n",
    "                data_writer.add_scalar(\"train/accuracy\", avg_acc.numpy(), iter)\n",
    "                iter = iter + 100\n",
    "\n",
    "            # Backward pass and parameter update\n",
    "            avg_loss.backward()\n",
    "            optimizer.minimize(avg_loss)\n",
    "            model.clear_gradients()\n",
    "\n",
    "    # Save the model parameters\n",
    "    fluid.save_dygraph(model.state_dict(), 'mnist')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Step 3: launch TensorBoard from the command line.\n",
    "\n",
    "Start TensorBoard with the command \"tensorboard --logdir [path to the data directory]\". After TensorBoard starts, the command line prints a URL at which the plots can be viewed in a browser.\n",
    "```\n",
    "$ tensorboard --logdir log/data\n",
    "```\n",
    "\n",
    "Step 4: open a browser and view the plots.\n",
    "\n",
    "The URL is printed after the launch command from Step 3 (for example, TensorBoard 2.0.0 at http://localhost:6006/). Entering it into the browser's address bar and refreshing produces a page like the one shown below. Besides the plotted data points on the right, a control panel on the left lets you adjust many details of the plot.\n",
    "\n",
    "\n",
    "<img src=\"https://ai-studio-static-online.cdn.bcebos.com/0f1926d02e4d49ff88ff088e65d11bc851d37db0a4d048b1b2725a206fb55152\" width=\"600\" height=\"400\" align=center />\n",
    "\n",
    "Figure: example plot from tb-paddle"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PaddlePaddle 1.6.0 (Python 3.5)",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
